We are using a two-component hurdle model: first, the model predicts whether a disease will be present (binary), and if present, it predicts the case count (integer). Here we compare the results of a boosted tree model to our baseline model.
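The two-stage logic described above can be sketched as follows. This is a minimal illustration, not the report's implementation: the function name `hurdle_predict`, its arguments, and the 0.5 threshold are all hypothetical, and the two inputs stand in for the outputs of whatever classifier and count model are fitted.

```python
from typing import Optional, Tuple


def hurdle_predict(
    p_present: float,
    count_if_present: float,
    threshold: float = 0.5,  # assumed cut-off, not taken from the report
) -> Tuple[bool, Optional[int]]:
    """Combine the two stages of a hurdle model for one observation.

    p_present        -- stage 1 output: predicted probability that disease is present
    count_if_present -- stage 2 output: predicted case count, conditional on presence
    """
    present = p_present >= threshold
    # The count model is only consulted when the hurdle is crossed.
    cases = int(round(count_if_present)) if present else None
    return present, cases


if __name__ == "__main__":
    print(hurdle_predict(0.2, 40.0))  # hurdle not crossed: no count is reported
    print(hurdle_predict(0.9, 12.4))  # hurdle crossed: count is rounded to an integer
```

Keeping the stages separate is what allows the status metrics and the case-count metrics to be evaluated independently.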

| .metric | desc | model | full_model |
|---|---|---|---|
| accuracy | proportion of the data that are predicted correctly | baseline | 0.81 |
| accuracy | | xgboost | 0.89 |
| kap | similar measure to accuracy(), but is normalized by the accuracy that would be expected by chance alone and is very useful when one or more classes have large frequency distributions. | baseline | 0.25 |
| kap | | xgboost | 0.67 |
| sens | the proportion of disease absent predictions out of the number of events which were actually absent | baseline | 0.99 |
| sens | | xgboost | 0.98 |
| spec | the proportion of disease present predictions out of the number of events which were actually present | baseline | 0.19 |
| spec | | xgboost | 0.62 |
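To make the four metric definitions concrete, here is a small sketch that computes them from a 2x2 confusion matrix, treating "disease absent" as the event class as the `sens` description above implies. The function name and the counts in the example are invented for illustration; they are not the report's data.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, Cohen's kappa, sensitivity and specificity from a 2x2 table.

    Event class = "disease absent" (assumption taken from the metric descriptions):
      tp -- actually absent,  predicted absent
      fn -- actually absent,  predicted present
      fp -- actually present, predicted absent
      tn -- actually present, predicted present
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sens = tp / (tp + fn)  # share of truly absent cases predicted absent
    spec = tn / (tn + fp)  # share of truly present cases predicted present

    # Agreement expected by chance, from the row and column marginals.
    p_chance = ((tp + fn) / n) * ((tp + fp) / n) + ((fp + tn) / n) * ((fn + tn) / n)
    kap = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "kap": kap, "sens": sens, "spec": spec}


if __name__ == "__main__":
    # Hypothetical counts: 800 absent and 200 present observations.
    for name, value in binary_metrics(tp=792, fn=8, fp=162, tn=38).items():
        print(f"{name}: {value:.2f}")
```

With these made-up counts the accuracy is 0.83 while kappa is only about 0.25. A model that almost always predicts the majority class scores well on accuracy but poorly once chance agreement is removed, which is the pattern the baseline rows show.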

| .metric | model | birds | buffaloes | camelidae | cats | cattle | cervidae | dogs | equidae | hares/rabbits | sheep/goats | swine |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| accuracy | baseline | 0.82 | 0.710 | 0.760 | 0.70 | 0.80 | 0.680 | 0.68 | 0.88 | 0.820 | 0.82 | 0.84 |
| accuracy | xgboost | 0.88 | 0.870 | 0.870 | 0.88 | 0.88 | 0.860 | 0.87 | 0.92 | 0.880 | 0.90 | 0.90 |
| kap | baseline | 0.23 | 0.089 | 0.083 | 0.19 | 0.34 | 0.041 | 0.26 | 0.26 | 0.083 | 0.26 | 0.25 |
| kap | xgboost | 0.61 | 0.690 | 0.640 | 0.73 | 0.68 | 0.690 | 0.72 | 0.62 | 0.510 | 0.67 | 0.64 |
| sens | baseline | 0.99 | 1.000 | 1.000 | 1.00 | 0.99 | 1.000 | 0.99 | 1.00 | 0.990 | 0.99 | 0.99 |
| sens | xgboost | 0.97 | 0.970 | 0.980 | 0.96 | 0.97 | 0.940 | 0.91 | 0.99 | 0.990 | 0.98 | 0.98 |
| spec | baseline | 0.17 | 0.067 | 0.059 | 0.15 | 0.27 | 0.031 | 0.24 | 0.17 | 0.060 | 0.19 | 0.18 |
| spec | xgboost | 0.56 | 0.680 | 0.590 | 0.74 | 0.67 | 0.720 | 0.80 | 0.50 | 0.420 | 0.62 | 0.57 |

| .metric | model | Africa | Americas | Asia | Europe | Oceania |
|---|---|---|---|---|---|---|
| accuracy | baseline | 0.80 | 0.76 | 0.81 | 0.84 | 0.890 |
| accuracy | xgboost | 0.88 | 0.87 | 0.90 | 0.90 | 0.940 |
| kap | baseline | 0.29 | 0.21 | 0.26 | 0.27 | 0.034 |
| kap | xgboost | 0.64 | 0.69 | 0.71 | 0.62 | 0.570 |
| sens | baseline | 0.99 | 1.00 | 0.99 | 0.99 | 1.000 |
| sens | xgboost | 0.97 | 0.97 | 0.97 | 0.98 | 1.000 |
| spec | baseline | 0.23 | 0.16 | 0.19 | 0.20 | 0.019 |
| spec | xgboost | 0.60 | 0.67 | 0.68 | 0.56 | 0.450 |

| .metric | desc | model | full_model |
|---|---|---|---|
| accuracy | proportion of the data that are predicted correctly | baseline | 0.81 |
| accuracy | | xgboost | 0.89 |
| kap | similar measure to accuracy(), but is normalized by the accuracy that would be expected by chance alone and is very useful when one or more classes have large frequency distributions. | baseline | -0.01 |
| kap | | xgboost | 0.45 |

Note that the baseline model produces some “outbreak ends” predictions. These occur when the `lag1` disease status is 1 but the `lag1` case count is 0 or NA. The `predict()` function predicts `lag1` cases only when the `lag1` disease status is 1.
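One possible reading of the note above, sketched in code. The report does not show its `predict()` method, so everything here is an assumption: that the baseline simply carries the previous period's values forward, and that the function names `baseline_predict` and `is_edge_case` are reasonable stand-ins.

```python
from typing import Optional, Tuple


def baseline_predict(lag1_status: int, lag1_cases: Optional[int]) -> Tuple[int, Optional[int]]:
    """Hypothetical persistence baseline: reuse last period's values.

    Cases are only predicted when the lagged disease status is 1;
    None stands in for R's NA.
    """
    if lag1_status == 1:
        # The lagged count is passed through unchanged, so it may be 0 or None.
        return 1, lag1_cases
    return 0, None


def is_edge_case(lag1_status: int, lag1_cases: Optional[int]) -> bool:
    """Flag rows where status is 1 but the lagged count is 0 or missing."""
    return lag1_status == 1 and (lag1_cases is None or lag1_cases == 0)


if __name__ == "__main__":
    rows = [(1, 25), (1, 0), (1, None), (0, None)]
    for status, cases in rows:
        print(status, cases, baseline_predict(status, cases), is_edge_case(status, cases))
```

Under this reading, the flagged rows are the ones where the baseline reports disease as present but carries no positive case count. That is the situation the note associates with “outbreak ends” predictions.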

| .metric | model | birds | buffaloes | camelidae | cats | cattle | cervidae | dogs | equidae | hares/rabbits | sheep/goats | swine |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| accuracy | baseline | 0.820 | 0.710 | 0.760 | 0.700 | 0.800 | 0.680 | 0.680 | 0.880 | 0.8200 | 0.8200 | 0.8400 |
| accuracy | xgboost | 0.880 | 0.870 | 0.870 | 0.880 | 0.880 | 0.860 | 0.870 | 0.920 | 0.8800 | 0.9000 | 0.9000 |
| kap | baseline | -0.016 | -0.044 | -0.013 | -0.056 | -0.015 | -0.017 | -0.027 | 0.011 | 0.0029 | -0.0084 | 0.0023 |
| kap | xgboost | 0.300 | 0.550 | 0.470 | 0.610 | 0.450 | 0.590 | 0.630 | 0.350 | 0.2400 | 0.4600 | 0.4100 |

| .metric | model | Africa | Americas | Asia | Europe | Oceania |
|---|---|---|---|---|---|---|
| accuracy | baseline | 0.8000 | 0.760 | 0.810 | 0.840 | 0.890 |
| accuracy | xgboost | 0.8800 | 0.870 | 0.900 | 0.900 | 0.940 |
| kap | baseline | 0.0011 | -0.028 | -0.023 | 0.011 | -0.017 |
| kap | xgboost | 0.3900 | 0.500 | 0.510 | 0.380 | 0.270 |

Here we evaluate the subset of the training data with positive case counts.

cases model stats

| model | .metric | .estimator | .estimate |
|---|---|---|---|
| baseline | rmse | standard | 485129 |
| xgboost | rmse | standard | 393058 |
| baseline | rsq | standard | 0.0179 |
| xgboost | rsq | standard | 0.122 |
| baseline | mae | standard | 7545 |
| xgboost | mae | standard | 5853 |
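
The three case-count metrics can be sketched as below. This assumes `rsq` is the squared Pearson correlation between observed and predicted counts, which is how yardstick defines it. The function names mirror the metric names, and the example data are invented.

```python
import math
from typing import Sequence


def rmse(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error: penalises large errors heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def mae(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error: every unit of error counts equally."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def rsq(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Squared Pearson correlation (assumes both inputs have non-zero variance)."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_t * var_p)


if __name__ == "__main__":
    # Made-up counts with one very large outbreak that is badly under-predicted.
    truth = [1, 2, 3, 1000]
    pred = [2, 2, 4, 500]
    print("rmse:", rmse(truth, pred))
    print("mae: ", mae(truth, pred))
    print("rsq: ", rsq(truth, pred))
```

In this toy example a single large miss pushes RMSE to roughly twice the MAE. In the table above RMSE is more than 60 times the MAE for both models, which is consistent with errors dominated by a small number of very large case counts.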